Z-Algorithm String Matching
Interactive demonstration of linear-time pattern search using the Z-Array.
📝 Problem Statement
Find all occurrences of a **Pattern ($P$)** within a **Text ($T$)**. The Z-Algorithm achieves this by constructing a special array called the **Z-Array** from a concatenated string: $S = P + \$ + T$.
The separator $\$$ is a unique character not found in $P$ or $T$.
Input Section
Concatenated String ($S$):
💡 The Z-Array Concept
Z-Value Definition:
The Z-Array, $Z$, has length $|S|$. $Z[i]$ is the length of the **longest substring** starting at $S[i]$ that is also a **prefix** of $S$.
Pattern Matching via Z-Array:
- **Concatenate**: Create $S = P + \$ + T$.
- **Compute Z-Array**: Calculate $Z[i]$ for all $i$. (Note: $Z[0]$ is usually defined as $|S|$, but is ignored in practice).
- **Find Matches**: A match of $P$ starts in $T$ at index $k$ (which is $M+1+k$ in $S$) if and only if **$Z[M+1+k] = M$**.
The Efficient "Z-Box" Optimization:
The algorithm avoids redundant character comparisons by tracking the **Z-Box**, defined by its left ($L$) and right ($R$) boundaries. This box covers the longest substring starting at $S[L]$ that matches a prefix of $S$, where $L < i \le R$.
- **Case 1 (Outside Z-Box, $i > R$)**: Perform a naive character comparison starting from $S[i]$ to find $Z[i]$. Update $L=i$ and $R=i+Z[i]-1$.
- **Case 2 (Inside Z-Box, $i \le R$)**: Utilize previous information! Let $k = i-L$.
- **Subcase A ($Z[k] < R-i+1$)**: $Z[i]$ must be equal to $Z[k]$. **No comparison needed.**
- **Subcase B ($Z[k] \ge R-i+1$)**: $Z[i]$ is at least $R-i+1$. Perform character comparison starting from $S[R+1]$ to extend $Z[i]$. Update $R$ and $L=i$.
🎬 Step-by-Step Demo
Z-Box Status:
Concatenated String ($S$) & Z-Array ($Z$)
Output Log
Pattern Match Indices (T): None
📊 Algorithm Analysis
Time Complexity
O(N + M)
The Z-Algorithm runs in linear time, which is optimal for string matching. Every character in the string $S$ is compared only a **constant number of times** in total.
Space Complexity
O(N + M)
Required to store the concatenated string $S = P\$T$ and the resulting Z-Array $Z$, both of size $|S| = N+M+1$.
Efficiency Advantage
The Z-Algorithm is simpler to implement than KMP and achieves the same optimal $O(N+M)$ worst-case time complexity. Its efficiency comes entirely from the **Z-Box optimization**, which ensures that the total number of character comparisons over the entire process is linear with respect to the string length.